Abstract. This paper shows that class inheritance, viewed as a mechanism for
composing self-referential namespaces, is a broadly applicable concept. We
show that several kinds of software artifacts can be modeled as
self-referential namespaces, and that software tools based on a model of
namespace composition can effectively manage these artifacts. We describe
four such tools: an interpreter for compositionally modular Scheme, a
compositional linker for object files, a compositional interface definition
language, and a compositional document processing tool. We show that these
tools benefit significantly from incorporating inheritance-based reuse.
Furthermore, the implementations of these tools share much in common, since
they are based on the same underlying model. We describe a reusable OO
framework for efficiently constructing such tools. Three of the above tools
were built by directly reusing the application framework, and the fourth
evolved in parallel with it. We provide reuse statistics and report our
experiences with the development of the framework and its completions.



1 Introduction 

Inheritance of classes in object-oriented programming has been touted for en- 
abling significant levels of implementation reuse. Inheritance is widely acknowl- 
edged to support reuse via incremental programming — one needs to only pro- 
gram how new classes differ from already existing ones. 

One characterization of class-based inheritance is that it is the combination 
of self-referential namespaces [12]. By carefully designing operations to manip- 
ulate such namespaces, a wide spectrum of effects of single and multiple inher- 
itance can be obtained. Compositional modularity [3, 6] is such an inheritance 
model, in which self-referential namespaces, known as modules, can be adapted 
and composed in various ways to achieve implementation reuse. Compositional 
modularity supports a stronger and more flexible reuse model than traditional 
class-based inheritance. 

* This research was sponsored by the Defense Advanced Research Projects Agency 
under contract number DABT63-94-C-0058, and by the Office of Naval Research 
under grant number N00014-95-1-0737. 



When class-based inheritance is distilled down to a notion of operations on
self-referential namespaces, it becomes possible to explore the breadth of
applicability of the concept of inheritance. There is indeed a wide range of software
artifacts that can be modeled as self-referential namespaces. For instance, it is 
well known that interface types can be viewed as self-referential namespaces 
[8]. A traditional compiled object file can also be viewed as a self-referential 
namespace. Furthermore, structured document fragments can be modeled as 
self-referential namespaces. Even other artifacts, such as GUI components and 
file system directories can be regarded as self-referential namespaces. 

There currently exists a range of tools that manage the range of artifacts 
mentioned above. However, many such tools are usually based on disparate,
and often impoverished, underlying models. In this paper, we argue that it is 
advantageous to manage the above artifacts from the viewpoint of a well under- 
stood model such as compositional modularity, and design tools based on this 
viewpoint. The primary advantage of such an approach is that the underlying 
model of such tools can be significantly enriched, and reuse mechanisms akin 
to inheritance can be supported on the artifacts they manage. Moreover, the 
uniformity of the underlying model of such tools can be exploited to support 
better interactions between them. 

The model of compositional modularity can be easily and effectively ap- 
plied within tools that manipulate artifacts such as the ones given above. To 
demonstrate, we describe four such tools in this paper: (i) an interpreter for 
a compositional module system for the Scheme programming language, (ii) a 
linker that manipulates compiled object files as compositional modules, (iii) a 
compiler front-end for an interface definition language with compositional
interfaces, and (iv) a document processing system that manipulates documents as
compositional modules. We also discuss other tools that could be based on com- 
positional modularity. We show that tools such as the above derive important 
benefits from incorporating compositional modularity. 

Naturally, the implementations of these tools share much in common, since
they are all based on compositional modularity. It is therefore beneficial to
abstract their common aspects and realize them as a reusable software
architecture. We have designed an OO application framework known as Etyma
that encompasses the reusable architecture of tools based on compositional
modularity. The primary utility of the Etyma framework is that it enables one
to easily and rapidly build module composition engines for tools that
manipulate a variety of compositional modules. Etyma consists of more than 40
reusable C++ classes. In this paper, we document the architecture of Etyma
using design patterns and describe the construction of three of the above
four tools as direct completions of the framework. We report that significant
design and code reuse (between 73 and 91%) was obtained in the construction
of the above prototypes as completions of the framework. We also outline our
experience with the iterative development of the framework.

The following section provides some background on the semantic foundations 
of compositional modularity (also referred to as CM for short). Subsections of 



Section 3 describe each of the four compositional tools mentioned above. In 
particular, Subsection 3.1 describes the CM model via examples in a Scheme 
based language; this subsection is intended as an extended introduction to CM. 
Section 4 then presents the architecture, class design, and reuse statistics for the 
Etyma framework and its completions. 

2 Background and Related Work 

Based on the notion of operations on records developed by Cardelli and others
[9], Cook and Palsberg [12] modeled a class as a self-referential
record-generating function, also known as a generator. For example, the
generator $g = \lambda s.\, \{a_1 = v_1, a_2 = v_2, \ldots, a_n = v_n\}$ has
method names $a_1 \ldots a_n$ bound to method bodies $v_1 \ldots v_n$. The
parameter $s$ corresponds to the generator's notion of "self." References to
names from within the method bodies are made via the $s$ parameter, e.g.,
$s.a_i$, and hence are known as self-references. The fixpoint $Y(g)$ of such a
generator, a record, corresponds to an instance of the class $g$. Taking the
fixpoint of the generator binds the generator's self-references $s.a_i$.

The notion of class inheritance is modeled as combination of generators, 
via operators such as merge and override. For instance, the notion of method 
overriding for generators, override, is defined in terms of record overriding
($\leftarrow_r$ denotes the record override operator):

$\mathit{override} = \lambda g_1.\, \lambda g_2.\, \lambda s.\, g_1(s) \leftarrow_r g_2(s)$

The crucial aspect of inheritance is that of self-reference manipulation — 
while combining classes during inheritance, a superclass' notion of self must be 
properly modified to include that of the subclass. This is captured by the above 
definition. 
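The generator view can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: records are dictionaries, "self" is a lookup function into the finished record, and the helper names fix and override are ours rather than part of any tool described in this paper.

```python
# Sketch only: generators as Python functions from "self" to a record (dict).
# The helper names fix and override are illustrative, not the paper's code.

def fix(gen):
    """Fixpoint Y(g): build the record and bind self-references to it."""
    record = {}
    # "self" is modeled as a lookup function into the finished record.
    record.update(gen(lambda name: record[name]))
    return record

def override(g1, g2):
    """Generator override: bindings of g2 replace those of g1, and both
    generators receive the same, combined notion of self."""
    return lambda s: {**g1(s), **g2(s)}

# "Superclass": describe refers to name through self.
base = lambda s: {
    "name": lambda: "base",
    "describe": lambda: "I am " + s("name")(),
}

# Increment that rebinds name only.
delta = lambda s: {"name": lambda: "derived"}

base_obj = fix(base)
sub_obj = fix(override(base, delta))

print(base_obj["describe"]())  # I am base
print(sub_obj["describe"]())   # I am derived
```

Because describe reaches name through self, the inherited method observes the rebinding made by the increment, which is the self-reference manipulation described above.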

Based on Cook's work, Bracha and Lindstrom [6] developed a uniform and 
comprehensive suite of linguistic operations on a simple notion of classes known 
as modules, also modeled as generators. These operations individually achieve
effects of rebinding, sharing, encapsulation, and static binding. In addition
to making previously existing operators explicit linguistic constructs, they
define three new operators: hide, freeze, and copy-as. For example, a method
of a generator can be copied under another name in order to achieve access to
overridden methods, as follows ($\parallel_r$ denotes the record merge
operation):

$\textit{copy-as}\ a\ b = \lambda g.\, \lambda s.\, \mathbf{let}\ \mathit{super} = g(s)\ \mathbf{in}\ \mathit{super} \parallel_r \{b = \mathit{super}.a\}$

In [3], we further augment the above model to include a notion of hierarchical 
nesting as a composition operation, arguing that module nestability and separate 
development must co-exist in a modularity framework without compromising 
each other. This requires abstracting the environment of a generator, resulting 
in what we call a closed generator, e.g.,
$g_c = \lambda e.\, \lambda s.\, \{a_1 = v_1, a_2 = v_2, \ldots, a_n = v_n\}$.
Environmental references from within method bodies are made via a separate
$e$ parameter. With this, separately developed modules can be retroactively



nested into conforming modules via a composition operator named nest, defined 
as follows: 

$\mathit{nest}\ n = \lambda g_{c_1}.\, \lambda g_{c_2}.\, \lambda e.\, \lambda s.\, \{n = \lambda d.\, g_{c_1}(e\ s)\} \parallel_r g_{c_2}(e)(s)$

Compositional Modularity. The above concept of closed generators, along with 
eight primary operations on them, merge, override, rename, copy-as, restrict,
freeze, hide, and nest, within an imperative store-based framework with ap- 
propriate static typing rules comprise the model of compositional modularity 
[3]. The term composition is used here to mean implementation composition to 
achieve reuse akin to inheritance. The goal of CM is to get maximal reuse out 
of small, composable components. The composition constructs given above pro- 
vide a powerful framework for building larger modules from smaller ones. These 
constructs can be used in combination to emulate various composite inheritance 
idioms in existing OO languages. As a result, CM supports a stronger (by virtue
of compositional nesting) as well as a more flexible (by virtue of "unbundled," 
composable operators) notion of reuse than traditional inheritance models. 

3 Systems based on Compositional Modularity 

To provide a better understanding of how one can apply CM within various 
tools, we describe four systems based on CM in this section. As mentioned 
earlier, CM can be layered on top of systems that have a notion of self-referential 
namespaces and some benefit to be derived from composing them. For systems 
that have these characteristics, a software tool that manipulates namespaces 
using operations of CM can be constructed. However, it must be pointed out 
that not all eight of the CM operations may be useful or even possible within 
every system. Nevertheless, we will show in the following sections that enriching a 
system by incorporating CM gives rise to specific benefits relating to the system's 
expressive power, flexibility, and/or scope. 

3.1 CMS

The first obvious choice for applying compositional modularity is within a mod- 
ular programming language. In this section, we describe via examples a module 
system based on CM for the programming language Scheme [10], which we call 
Compositionally Modular Scheme, or CMS for short. 

A module is generally understood to be an independent namespace. A Scheme 
module may be modeled as a self-referential namespace, as follows. A Scheme 
module may be regarded as a set of symbols (identifiers) bound either to loca- 
tions (variables) or to any of the various Scheme values, including procedures. 
Procedures may contain self-references to other names defined within the mod- 
ule, or to unbound names within the module which correspond to "abstract 
methods." (In more traditional module systems, unbound names might corre- 
spond to the notion of imported names, with the actual importation performed 
via module combination, described below.) 



Several module systems for Scheme have been proposed previously [13, 26, 
24], but these systems mainly provide a facility for structuring programs via 
decomposition. However, the ability to recompose first-class modules can
additionally support design and implementation reuse akin to inheritance in OO
programming. Furthermore, the notion of first-class modules and their oper- 
ations in CM is consistent with the uniform use of first-class values and the 
expression-oriented nature of Scheme. Consequently, we argue that the
incorporation of CM into a module system for Scheme can be very beneficial.
(There is previous work on Scheme module systems based on reflective
operations on first-class environments [18]; however, the CMS module system
differs in its approach and scope, as discussed in [3].)

Module definition and encapsulation. A module in CMS is a Scheme value that
is created with the mk-module primitive. It consists of a set of attributes
(symbol-binding pairs) with no order significance. Attributes that are bound
to procedures are referred to as methods, borrowing from OO programming.
Modules may be manipulated, but their attributes cannot be accessed or
evaluated until they are instantiated via the mk-instance primitive. The
attributes of a module instance can be accessed via the attr-ref primitive,
and assigned to via the attr-set! primitive. A method can access other
attributes within its own instance via analogous primitives: self-ref and
self-set!.

Figure 1 (a) shows a simple module with three attributes bound to a Scheme 
variable fueled-vehicle. Note that the fill method refers to an attribute capacity 
that is not defined within the module, but is expected to be the fuel capacity of 
the vehicle in gallons. 

The primitive hide retroactively encapsulates its argument attribute. In Fig- 
ure 1 (b), the hide expression returns a new module with an encapsulated fuel 
attribute that has an internal, inaccessible name, shown by the describe primitive 
as <priv-attr>. 

Module combination. The module capacity-module given in Figure 1 (c) exports 
two symbols, including one named capacity. Thus, the module encap-fueled- 
vehicle can be combined with capacity-module to satisfy the former's "import" 
requirement, via the primitive merge. The new merged module vehicle in Figure 1 (c)
contains four public attributes: empty?, fill, capacity, and greater-capacity?. 

The primitive merge does not permit combining modules with conflicting 
defined attributes, i.e., attributes that are defined to have the same name. In 
the presence of conflicting attributes, one can use override, which creates a new 
module by choosing the right operand's binding over the left operand's in the 
resulting module. For example, the module new-capacity in Figure 1 (d) cannot be 
merged with vehicle since the two modules have a conflicting attribute capacity. 
However, new-capacity can override vehicle, as shown. 
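The difference between merge and override can be sketched outside CMS. The following Python fragment is a simplified model, not the CMS implementation: a module is reduced to a dictionary of its defined attributes, method bodies are elided as placeholder strings, and the function names merely mirror the primitives.

```python
# Sketch only: a module is modeled as a dict of its *defined* attributes.
# Method bodies are elided; "<method>" is a placeholder, not CMS syntax.

def merge(m1, m2):
    """Combine two modules; conflicting defined attributes are an error."""
    conflicts = set(m1) & set(m2)
    if conflicts:
        raise ValueError(f"conflicting attributes: {sorted(conflicts)}")
    return {**m1, **m2}

def override(m1, m2):
    """Combine two modules; the right operand's bindings win on conflict."""
    return {**m1, **m2}

encap_fueled_vehicle = {"empty?": "<method>", "fill": "<method>"}
capacity_module = {"capacity": 10, "greater-capacity?": "<method>"}

# No conflicts, so merge succeeds and yields four public attributes.
vehicle = merge(encap_fueled_vehicle, capacity_module)

new_capacity = {"capacity": 25}
try:
    merge(vehicle, new_capacity)      # capacity is defined on both sides
    merge_failed = False
except ValueError:
    merge_failed = True

new_vehicle = override(vehicle, new_capacity)

print(sorted(vehicle))
print(merge_failed, new_vehicle["capacity"])
```

Both operations return new modules and leave their operands unchanged, so vehicle still binds capacity to 10 after new-vehicle has been built.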

Module adaptation. Besides hide, there are four other primitives which can be 
used to create new modules by adapting some aspect of the attributes of exist- 
ing modules. The primitive restrict simply removes the definition of the given 



(a)
    (define fueled-vehicle
      (mk-module
        ((fuel 0)
         (empty? (lambda () (= (self-ref fuel) 0)))
         (fill (lambda () (self-set! fuel (self-ref capacity)))))))

(b)
    (define encap-fueled-vehicle (hide fueled-vehicle 'fuel))
    (describe encap-fueled-vehicle)
    ((empty? (lambda () (= (self-ref <priv-attr>) 0))) (fill ...))

(c)
    (define capacity-module
      (mk-module
        ((capacity 10)
         (greater-capacity? (lambda (in)
                              (> (self-ref capacity) (attr-ref in capacity)))))))
    (define vehicle (merge encap-fueled-vehicle capacity-module))

(d)
    (define new-capacity (mk-module ((capacity 25))))
    (define new-vehicle (override vehicle new-capacity))

Fig. 1. Basic module operations. (a) Definition via mk-module, (b) encapsulation via
hide, (c) combination via merge, and (d) rebinding via override.

(defined) attribute from the module, i.e., makes it undefined (see Figure 2 (a)). 
The primitive rename changes the name of the definition of, and self-references 
to, the attribute in its second argument to the one in the third argument. An 
undefined attribute, i.e., an attribute that is not defined but is self-referenced, 
can also be renamed. An example is shown in Figure 2 (b). 

The primitive copy-as copies the binding of the attribute in its second argu- 
ment (which must be defined) with the name in its third argument. An example 
is shown in Figure 2 (c). The primitive freeze statically binds self-references to 
the given attribute, provided it is defined in the module. Freezing the attribute 
capacity in the module vehicle causes self-references to capacity to be statically 
bound, but the attribute capacity itself is available in the public interface for 
further manipulation, e.g., rebinding by combination. As shown in Figure 2 (d), 
frozen self-references to capacity are transformed to refer to a private version of 
the attribute. 

Module nesting. In CMS, modules may be nested within other modules by binding
them to attributes, as in modules type1 and type2 within vehicle-category in
Figure 3 (a). Nested modules may refer to name bindings in their surrounding
module via the env-ref primitive. Additionally, a separately developed module
may be retroactively nested within another module via the operator nest. An 
example is shown in Figure 3 (b). The nest expression in the example produces 
a module that contains the attribute type3 bound to the nested module veh-type 
just as if it was directly lexically nested. 



(a)
    (describe (restrict vehicle 'capacity))
    ((fill ...) (empty? ...) (greater-capacity? ...))

(b)
    (describe (rename vehicle 'capacity 'fuel-capacity))
    ((fuel-capacity 10) (fill ... (self-ref fuel-capacity)) ...)

(c)
    (describe (copy-as vehicle 'capacity 'default-capacity))
    ((capacity 10) (default-capacity 10) (fill ... (self-ref capacity)) ...)

(d)
    (describe (freeze vehicle 'capacity))
    ((capacity 10) (fill ... (self-ref <priv-attr>)) ...)

Fig. 2. Adaptation. (a) Removing an attribute via restrict, (b) renaming an attribute
and self-references to it via rename, (c) copying an attribute via copy-as, and
(d) statically binding self-references to an attribute via freeze.



(a)
    (define vehicle-category
      (mk-module
        ((capacity 10)
         (type1 (mk-module ((fill (lambda ... (env-ref capacity) ...)))))
         (type2 (mk-module ((fill (lambda ... (env-ref capacity) ...))))))))
    (define mycategory (mk-instance vehicle-category))
    (define v1 (mk-instance (attr-ref mycategory type1)))

(b)
    (define veh-type (mk-module ((fill (lambda ... (env-ref capacity) ...)))))
    (define new-vehicle-category (nest 'type3 veh-type vehicle-category))

Fig. 3. Nested modules. (a) Lexical nesting, and (b) retroactive nesting via the nest
operator.

Composite Inheritance. With the above suite of primitives, several composite
inheritance idioms can be emulated, including super-based and prefix-based
single inheritance, as well as mixin-based and general forms of multiple
inheritance with various types of conflict resolution and sharing strategies;
see [3] for a detailed description. To give some insight, Figure 4 pictorially
shows how super-based and prefix-based single inheritance can be emulated
using CM primitives. Figure 4 (a) shows a "superclass" super with a method
meth and self-references to it. An increment delta has a redefinition of meth
in terms of the previous definition, referred to as old, as well as some
self-references to meth. The classes





Fig. 4. Pictorial representation of subclassing with single inheritance. Expressions for
obtaining sub are: (a) Super-based: (hide (override (copy-as super 'meth 'old) delta)
'old), and (b) Prefix-based: (hide (override (copy-as delta 'meth 'new) (rename super
'inner 'new)) 'new).



super and delta can be combined to form the "subclass" sub by using the sequence 
of operators copy-override-hide shown in the figure caption. Similarly, the BETA- 
style [20] prefixes super and delta in Figure 4(b) can be combined into sub using 
a similar sequence of operations. The difference is that (an adapted version of) 
the superclass overrides the increment in the case of prefix-based inheritance, as
opposed to the reverse for super-based inheritance. Indeed, that is the difference 
between the two forms of single inheritance. 

Two idiomatic sequences of operations in CM have proven to be very useful: 
copy-override-hide, and rename-merge-hide. These and other idioms of CM will 
be shown as we proceed.
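The copy-override-hide idiom for super-based inheritance can be traced in a small Python model of generators. This is a sketch under stated assumptions: generators are functions from a self lookup to a dictionary, hide is approximated by statically binding the attribute and removing it from the public record, and all helper and attribute names (fix, sup, call, and so on) are illustrative rather than taken from CMS.

```python
# Sketch only: a generator maps a "self" lookup function to a record (dict).

def fix(gen):
    """Instantiate a generator by binding self to its own record."""
    record = {}
    record.update(gen(lambda name: record[name]))
    return record

def override(g1, g2):
    """Bindings of g2 replace those of g1 under a shared self."""
    return lambda s: {**g1(s), **g2(s)}

def copy_as(gen, a, b):
    """Copy the binding of attribute a under the additional name b."""
    def copied(s):
        rec = gen(s)
        return {**rec, b: rec[a]}
    return copied

def hide(gen, name):
    """Make name private: self-references to it are bound statically and
    the attribute disappears from the public record."""
    def hidden(s):
        private = {}
        lookup = lambda n: private[n] if n in private else s(n)
        rec = gen(lookup)
        private[name] = rec[name]
        return {k: v for k, v in rec.items() if k != name}
    return hidden

# "Superclass" with a method meth and a self-reference to it.
sup = lambda s: {
    "meth": lambda: "super",
    "call": lambda: s("meth")(),
}

# Increment: redefines meth in terms of the previous definition, old.
delta = lambda s: {"meth": lambda: "delta+" + s("old")()}

# Super-based inheritance: copy, then override, then hide.
sub = hide(override(copy_as(sup, "meth", "old"), delta), "old")
sub_obj = fix(sub)

print(sub_obj["meth"]())  # delta+super
print(sub_obj["call"]())  # delta+super
print(sorted(sub_obj))    # ['call', 'meth']
```

The inherited method call reaches the redefined meth, while old remains reachable only from within the increment.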



3.2 Compositional Linking 

In this section, we describe the second of the four tools based on CM: a pro- 
grammable linker. 

The physical notion of a separately compiled object file may be modeled 
logically as a self-referential namespace. An object file essentially consists of a set
of symbols, each associated with data or code. This set of symbols is represented 
as a symbol table within the object module. Furthermore, there are internal 
self-references to these symbols which are represented as relocation information 
within the object module. 

The traditional notion of linking object files essentially corresponds to the 
merge operation in CM. However, the full power of CM made available via a pro- 
grammable linker can significantly enhance the ability to manage and bind ob- 



(open-module <path-string-expr>)
(merge <module-expr1> <module-expr2> ...)
(override <module-expr1> <module-expr2> ...)
(copy-as <module-expr> <from-name-expr> <to-name-expr>)
(rename <module-expr> <from-name-expr> <to-name-expr>)
(hide <module-expr> <sym-name-expr>)
(restrict <module-expr> <sym-name-expr>)
(fix <section-locn-list> <module-expr>)

Fig. 5. Syntax of some OMOS module primitives. 

ject modules. In particular, facilities such as function interposition, management 
of incremental additions of functionality to compiled libraries, and namespace 
management can be made more principled and flexible, as shown below. Con- 
sequently, there is much to gain from incorporating CM into a programmable 
linking tool. 

A programmable linker. OMOS [23, 5] is a programmable linker that supports 
CM for C language object files. OMOS is programmed using a Scheme based 
scripting language similar to CMS above, except that the modules manipulated 
in this language are compiled object files (dot-o files) as opposed to Scheme 
modules. A dot-o can be converted into a first-class compositional module via a 
primitive open-module, manipulated using the CM primitives, and instantiated 
into executable programs (bound to particular points in a process' address space)
using the primitive fix. The syntax of some OMOS module primitives is shown 
in Figure 5. 

Implementationally, most module operations transform the symbol table of 
the object file. For instance, the restrict operation essentially modifies a symbol 
table entry to indicate that the symbol is only declared (extern) and not defined. 
The hide operation removes a definition from the external interface of the object 
file, i.e., makes the definition static. Similarly, the rename and copy-as operations
modify symbol table entries. The primitive nest is not supported by OMOS, 
since the notion of nesting is not supported by the base language, C. 

Wrapping. To illustrate the use of the above primitives, this section describes
how to achieve several variations of a facility generally referred to as "wrapping." 
Figure 6 shows a C language service providing module LIB with a function f(), 
and its client module CLIENT that calls f(). (Although OMOS really operates 
on compiled dot-o files, the C source for modules is shown in the figure for 
illustration purposes.) Three varieties of wrapping can be illustrated with the 
modules shown in the figure. 

(1) A version of LIB that is wrapped with the module LWRAP so that all 
accesses to f() are indirected through LWRAP's f() can be produced with the 
expression: 



LWRAP
(Wrapper)

    extern void f_old();
    void f() {
        f_old();
        /* ... */
    }

CWRAP
(Wrapper)

    extern void f();
    void stub() {
        f();
        /* ... */
    }

Fig. 6. Module definitions for wrapping examples in OMOS.

(hide (override (copy-as LIB 'f 'f_old) LWRAP) 'f_old)

By using copy-as instead of rename, this expression ensures that self-references
to f() within LIB continue to refer to (the overridden) f() in the resultant, and 
are not renamed to f_old. 

(2) Alternatively, a wrapped version of LIB in which the definition of and 
self-references to f() are renamed can be produced using the expression:

(hide (merge (rename LIB 'f 'f_old) LWRAP) 'f_old)

This might be useful, for example, if we want to wrap LIB with a wrapper 
which counts only the number of external calls to LIB's f(), but does not count 
internal calls. 

(3) If we want to wrap all calls to f() from CLIENT so that they are mediated 
via the stub() function of module CWRAP, we can use the following expression: 

(hide (merge (rename CLIENT 'f 'stub) CWRAP) 'stub) 

Note that in this last case, only a particular client module is wrapped, without 
wrapping the service provider. In the example, renaming the client module's calls 
to f() produces the desired effect, since the declaration of f() as well as all self- 
references to it must be renamed. 
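The first two wrapping variations differ only in whether the library's internal references go through the wrapper. The following Python sketch models that difference with generators instead of object files; it is not OMOS code, and the run function standing in for an internal caller of f() is a hypothetical addition for illustration.

```python
# Sketch only: modules as generators (self lookup -> dict), not dot-o files.

def fix(gen):
    record = {}
    record.update(gen(lambda name: record[name]))
    return record

def merge(g1, g2):
    def merged(s):
        r1, r2 = g1(s), g2(s)
        if set(r1) & set(r2):
            raise ValueError("conflicting definitions")
        return {**r1, **r2}
    return merged

def override(g1, g2):
    return lambda s: {**g1(s), **g2(s)}

def copy_as(gen, a, b):
    def copied(s):
        rec = gen(s)
        return {**rec, b: rec[a]}
    return copied

def rename(gen, a, b):
    """Rename the definition of a, and self-references to a, to b."""
    def renamed(s):
        rec = gen(lambda n: s(b) if n == a else s(n))
        return {(b if k == a else k): v for k, v in rec.items()}
    return renamed

def hide(gen, name):
    def hidden(s):
        private = {}
        lookup = lambda n: private[n] if n in private else s(n)
        rec = gen(lookup)
        private[name] = rec[name]
        return {k: v for k, v in rec.items() if k != name}
    return hidden

calls = []  # one entry per call that passes through the wrapper

def lib(s):
    # run is a hypothetical internal caller of f within LIB.
    return {"f": lambda: "f-body", "run": lambda: s("f")()}

def lwrap(s):
    def f():
        calls.append("f")
        return s("f_old")()
    return {"f": f}

# (1) copy-as: internal references to f are wrapped as well.
v1 = fix(hide(override(copy_as(lib, "f", "f_old"), lwrap), "f_old"))
v1["run"]()
internal_wrapped_v1 = len(calls)

# (2) rename: internal references bypass the wrapper.
calls.clear()
v2 = fix(hide(merge(rename(lib, "f", "f_old"), lwrap), "f_old"))
v2["run"]()
internal_wrapped_v2 = len(calls)
v2["f"]()
external_wrapped_v2 = len(calls)

print(internal_wrapped_v1, internal_wrapped_v2, external_wrapped_v2)
```

In the second variation only the external call is recorded, which matches the use case of counting external but not internal calls.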

The idioms given above are in fact the basis of inheritance in OO
programming. Scheme macros that perform various kinds of single and multiple
inheritance can be used within OMOS just as in CMS. In [5], we describe an
architecture for OO application development via programmed linkage using OMOS.
Specifically, we show how to manage extensions to libraries, how to generate 



LIB
(Service Provider)

    void f() {
        /* .... */
        f();
        /* .... */
    }

CLIENT
(Client Program)

    extern void f();
    void g() {
        f();
        /* .... */
    }



static constructors and destructors, and how to manage the problem of flat 
namespaces with dot-o files generated from the C language.

3.3 Interface Composition 

In this section, we describe the third of the four tools based on CM: a composi- 
tional interface definition language. 

An interface is essentially a naming scope, with labels bound to types. In the 
case of recursive interfaces, type constituents of the interface may recursively 
refer back to the interface itself [8]. Thus, an interface can be modeled as a 
self-referential namespace. 

Explicit specification and composition of interfaces is becoming widespread in 
modern programming languages and distributed systems [22, 2, 19], particularly 
in interface definition languages (IDLs). It is useful to specify an interface by 
reusing, i.e., inheriting from, existing interfaces. Reuse facilitates the evolution 
of interfaces [17] by ensuring that inheriting interfaces evolve in step with the 
inherited interfaces. It also simplifies maintenance by reducing redundant code. 
Most importantly, an IDL should be able to express the types of components 
generated via implementation inheritance in module implementation languages. 
In fact, it has been shown that inheritance of interfaces generates exactly those 
types, known as inherited types, that correspond to the types of inherited ob- 
jects [11]. These reasons point to the need for flexible interface inheritance (or 
composition) mechanisms in IDLs. 

A compositional IDL. We have developed a compositional IDL to demonstrate
the concepts of compositionality of interfaces. The base type domain of the
language consists of primitive types, function types, and record types.
Interfaces in this language follow a structural type discipline. Interfaces
can be recursive, in that a type constituent can use the keyword selftype to
refer to its own interface. Furthermore, we take the analogy between
interface type constituents and methods of objects so far as to allow
interface type constituents to refer to sibling type constituents by
selecting on selftype [9]. For example, consider a Point module that contains
attributes corresponding to rectangular coordinates x and y, a method move
for changing the position of the point, and an equality predicate equal. Its
interface may be expressed as follows, where recursion is expressed using the
selftype keyword. (As a convenience, selftype.x is abbreviated to x.)

interface FloatPointType {
    float x, y;
    selftype move (x, y);
    boolean equal (selftype);
}

Inheritance is an operation on self-referential structures, thus it can be
applied to interfaces as well. For instance, the interface FloatPointType above
can be extended to have a color attribute using the merge operation, as follows:



interface ColorType {
    color_type color;
};

interface ColorPointType = FloatPointType merge ColorType;

Although it inherits from FloatPointType, the ColorPointType interface is not 
a subtype of FloatPointType, due to the contravariance of the equal method. 
However, ColorPointType shares the same structure as FloatPointType, hence it 
is known as an inherited type of FloatPointType [8, 7]. 

An important point to note here is that the merge operation on interfaces
generates types that correspond to the types of inherited module
implementations generated via both the merge and override operations on
module implementations. An override operation is defined on interfaces as
well, by which type constituents of interfaces may be arbitrarily rebound.
The primary motivation for including such an operator is to support a high
degree of reuse of existing interface specifications. In the following
example, the x and y constituents of FloatPointType are rebound to
complex_type; note that this will automatically result in the proper type for
the move constituent, due to self-reference.

interface ComplexPointType =
    FloatPointType override
        interface {
            complex_type x, y;
        };
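The way the type of move follows the rebinding of x and y can be sketched as follows. This Python model is an assumption-laden illustration, not the IDL front-end: each type constituent is a thunk so that sibling references are resolved only when the interface is closed, and function types are written as (result, parameters) pairs.

```python
# Sketch only: interfaces as generators from a self lookup to thunks that
# produce type descriptions. The representation is illustrative.

def fix(gen):
    """Close an interface: resolve every constituent against the final self."""
    record = {}
    record.update(gen(lambda name: record[name]()))
    return {name: thunk() for name, thunk in record.items()}

def override(i1, i2):
    """Constituents of i2 rebind those of i1 under a shared self."""
    return lambda s: {**i1(s), **i2(s)}

float_point_type = lambda s: {
    "x": lambda: "float",
    "y": lambda: "float",
    # move takes parameters typed as the sibling constituents x and y.
    "move": lambda: ("selftype", [s("x"), s("y")]),
    "equal": lambda: ("boolean", ["selftype"]),
}

complex_xy = lambda s: {
    "x": lambda: "complex_type",
    "y": lambda: "complex_type",
}

float_point = fix(float_point_type)
complex_point = fix(override(float_point_type, complex_xy))

print(float_point["move"])    # ('selftype', ['float', 'float'])
print(complex_point["move"])  # ('selftype', ['complex_type', 'complex_type'])
```

Only x and y are rebound explicitly; the parameter types of move change because they are self-references rather than copies.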

Type constituents may be renamed, which causes self-references to them to be
renamed as well. This is useful for resolving name conflicts while performing
operations equivalent to multiple inheritance. Furthermore, particular
interface constituents may be projected. This operation is analogous to the
one in relational algebra, and is the dual of the restrict operator presented
earlier.

The operator copy-as does not seem very useful in the context of interfaces. 
Also, the operators freeze and hide do not apply, since interfaces by definition 
represent the public types of modules. An operation corresponding to nest may 
be supported, but we are doubtful as to whether that level of expressiveness is 
useful in IDLs. 

3.4 Compositional Document Processing 

In this section, we describe the fourth of the four tools based on CM: a compo- 
sitional document processing system. 

A structured document may be viewed as a compositional module. Sections 
within the document correspond to module attributes, with each section com- 
prising a label, associated section heading, and some textual body. Cross refer- 
ences within text to other section labels correspond to self-references. Thus, the 
document can be regarded as a self-referential namespace. 

A large and complex document is often broken down into and composed
from smaller pieces. In such scenarios, there are many cases where documents 



developed for one purpose can be reused for other purposes. For example, a re- 
port, such as a user manual, can be composed from several document fragments, 
such as design documents. A specific scenario of modular document process- 
ing that motivated the document processing application of CM was document 
generation and consumption in the activity of building construction such as 
that described in [25]. Building architects routinely extract and maintain large 
bases of document fragments that they reuse, edit, and compose into architec- 
tural specifications for delivery to particular clients. As another example, in a 
document centered industrial process, document fragments are generated at all 
phases of the process with the objective of producing a number of reports such 
as inventory statement, parts catalog, assembly reports, process monitoring and 
quality control documents, etc. Thus, effective document composition tools can 
be useful in enterprises where several documents are generated, edited, com- 
posed, maintained, and delivered in various ways. In such environments, the 
model of compositional modularity can be used to enhance the composability 
and reusability of documents. 

A tool for composing document modules. We have developed a programmable 
document processing system based on CM, named MTeX, which can help a docu- 
ment preparer to adapt and compose documents effectively. It is built on top of 
a restricted version of the LaTeX document preparation system [21]. An MTeX 
program is a script based on Scheme (as in CMS) that describes how LaTeX 
document modules should be constructed and composed. 

An MTeX module is modeled as a generator of an ordered set of sections, each 
of which is a label bound either to a section body, or to a nested module. The 
section label is a symbolic name that can be referenced from other sections (de- 
fined using LaTeX's \label command). The section body is a tuple (H, B) where 
H is text corresponding to the section heading, and B corresponds to the actual 
text body, which consists of textual segments interspersed with self-references 
to labels. Given this model of document modules, consider the meaning of the 
operations of compositional modularity. 

The binary operator merge produces a new document module with the sec- 
tions of its right module operand concatenated to its left module operand, if there 
are no conflicting labels between the two module operands. Since the order of 
sections is significant, merge is associative, but not commutative. The binary 
operator override concatenates two modules in the presence of conflicting section 
labels. Conflicting sections in the right operand replace corresponding ones in 
the left operand. Non-conflicting sections in the right operand are appended to 
the left operand in the same order that they occur in the right operand. 
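The behavior of merge and override on ordered sections can be sketched in C++ as follows. This is only an illustration under simplifying assumptions, not the MTeX implementation: the names Section, DocModule, merge_docs, and override_docs are hypothetical, a module is reduced to a flat vector of sections, and a failed merge is reported through a boolean result.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified section: a label bound to heading and body text.
struct Section {
    std::string label;
    std::string heading;
    std::string body;
};

// A document module is modeled here as an ordered sequence of sections.
typedef std::vector<Section> DocModule;

// Returns true if the module defines a section with the given label.
inline bool defines(const DocModule& m, const std::string& label) {
    return std::any_of(m.begin(), m.end(),
                       [&](const Section& s) { return s.label == label; });
}

// merge: append the sections of `right` to those of `left`.
// Fails (returns false, leaving `out` untouched) if any label conflicts.
inline bool merge_docs(const DocModule& left, const DocModule& right,
                       DocModule& out) {
    for (const Section& s : right)
        if (defines(left, s.label)) return false;
    DocModule result = left;
    result.insert(result.end(), right.begin(), right.end());
    out = result;
    return true;
}

// override: a conflicting section of `right` replaces the corresponding
// section of `left` in place; non-conflicting sections of `right` are
// appended in the order in which they occur in `right`.
inline DocModule override_docs(const DocModule& left, const DocModule& right) {
    DocModule result = left;
    for (const Section& r : right) {
        auto it = std::find_if(result.begin(), result.end(),
                               [&](const Section& s) { return s.label == r.label; });
        if (it != result.end()) *it = r;
        else result.push_back(r);
    }
    return result;
}
```

Because the result preserves operand order, merging A with B and merging B with A yield different documents, which is why merge is associative but not commutative in this setting.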

The restrict operator has the usual meaning of removing sections. However, 
its dual operator project (analogous to relational algebra) is potentially more 
useful in the context of document composition. The operators rename and copy- 
as have the usual meaning. We have chosen not to support encapsulation, i.e., 
the hide operator, and static binding, i.e., freeze, although these could conceivably 
have natural meanings in some applications of document processing. 

(a) 

(merge (cl-project M1 '(L11 L12 ...)) 
       (merge (cl-project M2 '(...)) 
              ... 
              (cl-project Mn '(...)) ...)) 

(b) 

(let ((m (mk-module ()))) 
  (nest 'm1 "M1-heading" (cl-project M1 '(L11 L12 ...)) m) 
  ... 
  (nest 'mn "Mn-heading" (cl-project Mn '(Ln1 Ln2 ...)) m)) 

Fig. 7. Example of report generation. 

Hierarchical nesting is a very important and useful notion in document struc- 
turing. The nest operator supports retroactive nesting of document modules. 
However, in keeping with the generator semantics of CM, an environmental ref- 
erence within a nested module is resolved to a definition of the name in the 
innermost enclosing module. While this semantics does not permit references 
from a section to non-enclosing modules, it has the potential to produce highly 
structured documents. Finally, the notion of instantiating document modules 
interestingly corresponds to running the LaTeX document processing system on 
them. 

Report Generation. To illustrate some of the above notions, consider the exam- 
ple in Figure 7. At the top of the figure is shown a set of document fragments 
labeled M1 through Mn. Each of these fragments has several sections, where 
section Lij is the jth section in fragment Mi. Sections contain cross references to 
other defined or undefined sections within the document fragment. 

Considering each document fragment as an MTeX compositional module, two 
ways in which they can be usefully put together are described in the figure using 
the MTeX scripting language. The examples use a function named cl-project 
which projects sections corresponding to the closure of self-references within a 
module. This function can be written using the module primitive project and 
an introspective primitive self-refs-in, which returns the self-referenced names 
within a section. The expression in Figure 7(a) merges (closures of) particular 
sections projected from each of the modules, producing a document containing 
several sections at the same level. The expression in Figure 7(b) creates a new 
document module and nests within it one subsection per original module that 
contains (closures of) particular sections projected from each of the original 
modules. 
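One plausible way to write cl-project in terms of project and self-refs-in is a worklist computation of the transitive closure of self-references. The C++ sketch below is an assumption-laden illustration rather than the authors' code: the types Section and DocModule and the functions self_refs_in, project, and cl_project are hypothetical stand-ins for the corresponding MTeX primitives, and references to undefined sections are simply left dangling.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified section: the body is reduced to the list of
// labels it references.
struct Section {
    std::string label;
    std::string heading;
    std::vector<std::string> refs;  // self-referenced labels in the body
};

typedef std::vector<Section> DocModule;  // ordered sections

// Stand-in for the introspective primitive self-refs-in.
inline const std::vector<std::string>& self_refs_in(const Section& s) {
    return s.refs;
}

// Stand-in for the module primitive project: keep only the named
// sections, preserving their original order.
inline DocModule project(const DocModule& m,
                         const std::set<std::string>& labels) {
    DocModule result;
    for (const Section& s : m)
        if (labels.count(s.label)) result.push_back(s);
    return result;
}

// cl-project: project the named sections together with every section
// reachable from them through self-references.
inline DocModule cl_project(const DocModule& m,
                            const std::vector<std::string>& roots) {
    std::set<std::string> closure;
    std::vector<std::string> worklist(roots.begin(), roots.end());
    while (!worklist.empty()) {
        std::string label = worklist.back();
        worklist.pop_back();
        if (!closure.insert(label).second) continue;  // already visited
        for (const Section& s : m)
            if (s.label == label)
                for (const std::string& r : self_refs_in(s))
                    worklist.push_back(r);
    }
    // Labels with no definition in m are ignored by project.
    return project(m, closure);
}
```

The visited set makes the computation terminate even when sections refer to each other cyclically.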



4 The Etyma Framework and its Completions 



Earlier, it was mentioned that tools for systems based on CM can be constructed 
from a common architecture that encompasses the concepts of CM. In this 
section, we describe a simple software architecture, an OO framework named 
Etyma¹, that can be effectively reused to build tools for a wide variety of sys- 
tems based on CM such as the ones described in the previous section. Tools 
constructed from this framework benefit not only from the power and flexibil- 
ity that the underlying model offers, but also from significant design and code 
reuse. Thus, Etyma could significantly reduce the resources spent in developing 
tools, as well as increase their reliability. Furthermore, Etyma represents a good 
model for studying the domain of systems based on CM. 

A tool for a system based on CM can be said to consist of a front-end that 
reads in command and data input, a processing engine that performs CM oper- 
ations on an internal representation (IR), and an optional back-end that trans- 
forms the IR into some external representation. The Etyma framework is in- 
tended for constructing the processing engine along with the IR, rather than for 
building the front- and back-ends to such systems. 

Etyma is implemented in the C++ language [14]. It is continually evolving, 
but currently consists of about 45 reusable classes, and approximately 7,000 lines 
of C++ code. The C++ realization of the Etyma framework has undergone 
several iterations over almost two years. In Section 4.3, we outline the major 
evolutionary stages of the framework. 

4.1 Structure of Abstractions 

Compositional modularity deals with modules, their instances, the attributes 
they are composed of, and the types of all the above. Thus, the primary concepts 
that must be captured by a reusable architecture for CM such as Etyma are 
those of modules, instances, names, values, methods, variables, and their corre- 
sponding types. However, Etyma is also a linguistic framework, i.e., a framework 
from which language processing tools will be designed. Thus, while modeling the 
above concepts, we must not inadvertently limit their generality. For example, a 
method is a specialization of the general concept of a function. Similarly, the con- 
cept of a record is closely related to that of a module and an instance. We must 
also be careful in determining the precise relationships between concepts. For 
example, a module is a record generator whereas an instance is itself a record; 
thus, the concept of an instance is a subtype of the concept of a record, but 
neither of these concepts is subtype-related to the concept of a module. 

The abstractions of Etyma form two layers. An abstract layer consists of ab- 
stract class realizations (partial implementations) of the concepts given above. 
These classes may be used as a "white box" framework (via inheritance) by com- 
pletions. A concrete layer provides full implementations of the abstract classes 
that can be directly used as a "black box" framework (via instantiation) by 
completions. This layer, as customary, is meant to increase the reusability of the 
framework. Only the important classes in both layers, i.e., those corresponding 
to modules, instances, and methods, are described in more detail below. (For 
brevity, we omit classes corresponding to the type system.) We utilize the no- 
tion of design patterns [16] to elucidate the structure of the Etyma framework. 

¹ et.y.mon (pl. et.y.ma also etymons) [L, fr. Gk] ... 2: a word or morpheme from which 
words are formed by composition or derivation. (Webster Dictionary) 

Fig. 8. An overview of the abstract classes of Etyma. 

Abstract classes. Figure 8 shows an overview of the abstract classes of Etyma 
diagrammed using the OO notation in [16] extended to show protected meth- 
ods. Class Etymon is the abstract base class of all classes in Etyma, and classes 
TypedValue and Type represent the domains of values and their types respec- 
tively. 

The abstract class Module captures the notion of a compositional module in 
its broadest conception. Its public methods correspond to the module operators 
introduced earlier. Within this class, no concrete representation for module at- 
tributes is assumed. Instead, the public module operations are implemented as 
template method patterns in terms of a set of protected abstract methods such 
as insert, remove, etc. which manage module attributes. Concrete subclasses of 
Module are expected to provide implementations for these abstract protected 
methods. Two of these are abstract factory method patterns: create_instance, 
which is expected to return an instance of a concrete subclass of class Instance 
(below), and create_iter, which is expected to return an instance of a concrete 
subclass of class AttrIter, an iterator pattern for module attributes. Thus, the 
generality of class Module results from its use of a combination of the following 
patterns: template method, abstract factory method, and iterator. 
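The combination of patterns described above can be illustrated with a much-reduced C++ sketch. It is not the Etyma source: the signatures are assumptions, attribute values are reduced to int, operations mutate the receiver for brevity, create_instance is omitted, and MapModule is a hypothetical concrete subclass added only so that the example is complete.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>

typedef std::string Label;
typedef int Value;  // placeholder for a typed attribute value

// Iterator pattern: abstract traversal over module attributes.
class AttrIter {
public:
    virtual ~AttrIter() {}
    virtual bool done() const = 0;
    virtual Label label() const = 0;
    virtual Value value() const = 0;
    virtual void next() = 0;
};

class Module {
public:
    virtual ~Module() {}
    // Template method: the algorithm is fixed here and expressed only in
    // terms of the protected primitives. Returns false, leaving this
    // module unchanged, if any label conflicts.
    bool merge(const Module& other) {
        for (std::unique_ptr<AttrIter> it(other.create_iter()); !it->done(); it->next())
            if (defines(it->label())) return false;
        for (std::unique_ptr<AttrIter> it(other.create_iter()); !it->done(); it->next())
            insert(it->label(), it->value());
        return true;
    }
    // Template method for the restrict operator (named restrict_attr here).
    void restrict_attr(const Label& l) { remove(l); }
protected:
    // Primitives supplied by concrete subclasses.
    virtual bool defines(const Label& l) const = 0;
    virtual void insert(const Label& l, const Value& v) = 0;
    virtual void remove(const Label& l) = 0;
    virtual AttrIter* create_iter() const = 0;  // factory method
};

// Hypothetical concrete subclass that stores attributes in a std::map.
class MapModule : public Module {
    typedef std::map<Label, Value> Map;
    Map attrs_;
    class Iter : public AttrIter {
        Map::const_iterator cur_, end_;
    public:
        explicit Iter(const Map& m) : cur_(m.begin()), end_(m.end()) {}
        bool done() const override { return cur_ == end_; }
        Label label() const override { return cur_->first; }
        Value value() const override { return cur_->second; }
        void next() override { ++cur_; }
    };
public:
    void define(const Label& l, Value v) { insert(l, v); }
    bool has(const Label& l) const { return defines(l); }
    std::size_t size() const { return attrs_.size(); }
protected:
    bool defines(const Label& l) const override { return attrs_.count(l) != 0; }
    void insert(const Label& l, const Value& v) override { attrs_[l] = v; }
    void remove(const Label& l) override { attrs_.erase(l); }
    AttrIter* create_iter() const override { return new Iter(attrs_); }
};
```

Note that merge never touches the map directly, so a subclass with a different attribute representation would inherit it unchanged.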

Class Instance is a subclass of Record; hence it supports record operations, 
implemented in a manner similar to those of class Module. In addition, it mod- 
els the traditional OO notion of sending a message (dispatch) to an object as 
selecting a method-valued attribute followed by invoking apply on it. This func- 
tionality is encapsulated by a template method pattern msg-send(Label, Args). 
Furthermore, class Instance has access to its generating module via its module 
data member. 

The concept of a method is modeled as a specialization of the concept of a 
function. Class Function supports an apply method that evaluates the function 
body. Although class Function is a concrete class, the function body is represented 
by an abstract class ExprNode, a composite pattern. Since a method "belongs 
to" a class, class Method requires that the first argument to its apply method is 
an instance of class Instance, corresponding to its notion of self. 
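Dispatch as "select, then apply" can be sketched as follows. The sketch rests on several assumptions that are not in the framework as described: the method body is a std::function rather than an ExprNode composite, Function and Method are collapsed into a single class, values are plain integers, the module back-pointer is omitted, and msg_send, select, and the helper members are hypothetical names.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

typedef std::string Label;
typedef int Value;                // placeholder for a typed value
typedef std::vector<Value> Args;

class Instance;

// A method is a function whose first argument is the receiving
// instance, corresponding to its notion of self.
class Method {
public:
    typedef std::function<Value(Instance&, const Args&)> Body;
    explicit Method(Body body) : body_(std::move(body)) {}
    Value apply(Instance& self, const Args& args) const {
        return body_(self, args);
    }
private:
    Body body_;
};

class Instance {
public:
    virtual ~Instance() {}
    void define_method(const Label& l, Method m) {
        methods_.erase(l);
        methods_.insert(std::make_pair(l, std::move(m)));
    }
    void set_field(const Label& l, Value v) { fields_[l] = v; }
    Value field(const Label& l) const { return fields_.at(l); }

    // Template method: dispatch is selection of a method-valued
    // attribute followed by apply, with this instance passed as self.
    Value msg_send(const Label& l, const Args& args) {
        const Method& m = select(l);
        return m.apply(*this, args);
    }
protected:
    // Selection step; throws if no method is bound to the label.
    virtual const Method& select(const Label& l) const {
        std::map<Label, Method>::const_iterator it = methods_.find(l);
        if (it == methods_.end())
            throw std::out_of_range("no such method: " + l);
        return it->second;
    }
private:
    std::map<Label, Method> methods_;
    std::map<Label, Value> fields_;
};
```

A method body can send further messages to self, so a self-reference is resolved against whatever the instance binds to that label at the time of the call.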

Concrete classes. Some abstract classes in Figure 8 are subclassed into concrete 
classes to facilitate immediate reuse. Class StdModule is a concrete subclass 
of Module that represents its attributes as a map. An attribute map (object of 
class AttrMap) is a collection of individual attributes, each of which maps a name 
(object of class Label) to a binding (object of class AttrValue). A binding encap- 
sulates an object of any subclass of TypedValue. This structure corresponds to a 
variation of the bridge pattern, which makes it possible for completions to reuse 
much of the implementation of class Module by simply implementing classes 
corresponding to attribute bindings as subclasses of TypedValue. 

Each of StdModule's attribute management functions is implemented as the 
corresponding operation on the map. Furthermore, the factory method pattern 
create_iter of StdModule returns an object of a concrete subclass of class AttrIter, 
class StdAttrIter. Similarly, the factory method pattern create_instance returns 
an object of the concrete subclass of class Instance, class StdInstance. Class 
StdInstance itself is also implemented using attribute maps. 

4.2 Completion Construction 

As mentioned earlier, Etyma can be used to construct the processing engines of 
tools for compositionally modular systems. In practice, one must first identify the 
various kinds of name bindings comprising namespaces in the system. One can 
then identify generalizations of these concepts specified as classes in the Etyma 
framework. For each such general Etyma class, one must then subclass it to 
implement the more specific concept in the system. Once this is done, concrete 
classes in the framework corresponding to modules, instances, and interfaces can 
usually be almost completely reused, due to the bridge pattern mentioned above. 
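The completion recipe above can be made concrete with a small C++ sketch. It assumes simplified versions of the framework classes: TypedValue, AttrValue, and StdModule are reduced stand-ins with hypothetical signatures (bind, lookup, override_with, describe), and SchemeNumber and SectionText are hypothetical completion-side classes. The point it illustrates is that the module class is reused as is, while a completion only adds binding subclasses.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

typedef std::string Label;

// Framework side (simplified): the domain of attribute values.
class TypedValue {
public:
    virtual ~TypedValue() {}
    virtual std::string describe() const = 0;
};

// A binding encapsulates an object of any TypedValue subclass.
class AttrValue {
public:
    explicit AttrValue(std::shared_ptr<const TypedValue> v)
        : value_(std::move(v)) {}
    const TypedValue& value() const { return *value_; }
private:
    std::shared_ptr<const TypedValue> value_;
};

// Framework side (simplified): a module over a map of labels to bindings.
class StdModule {
public:
    void bind(const Label& l, std::shared_ptr<const TypedValue> v) {
        attrs_.erase(l);
        attrs_.insert(std::make_pair(l, AttrValue(std::move(v))));
    }
    bool defines(const Label& l) const { return attrs_.count(l) != 0; }
    const TypedValue& lookup(const Label& l) const {
        return attrs_.at(l).value();
    }
    // override: bindings of the right operand win on conflict.
    StdModule override_with(const StdModule& right) const {
        StdModule result(*this);
        for (std::map<Label, AttrValue>::const_iterator it = right.attrs_.begin();
             it != right.attrs_.end(); ++it) {
            result.attrs_.erase(it->first);
            result.attrs_.insert(*it);
        }
        return result;
    }
private:
    std::map<Label, AttrValue> attrs_;
};

// Completion side (hypothetical): two kinds of bindings for two tools.
class SchemeNumber : public TypedValue {
public:
    explicit SchemeNumber(int n) : n_(n) {}
    std::string describe() const override {
        return "number " + std::to_string(n_);
    }
private:
    int n_;
};

class SectionText : public TypedValue {
public:
    SectionText(const std::string& heading, const std::string& body)
        : heading_(heading), body_(body) {}
    std::string describe() const override {
        return "section " + heading_ + ": " + body_;
    }
private:
    std::string heading_, body_;
};
```

Neither completion class required a change to StdModule; override_with operates on bindings without knowing their concrete type.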

Architecturally, tools constructed as completions of Etyma have the ba- 
sic structure given in Figure 9. The command input component reads in mod- 
ule manipulation programs that direct the composition engine. The data input 
component creates the internal representation (IR) of compositional modules 
by parsing module source data and instantiating appropriate framework and 
completion classes. The optional data output component transforms IR into a 
suitable output format. The composition engine itself is derived from the Etyma 
framework, and comprises classes (data and composition behavior) correspond- 
ing to module related entities. In the following subsections, three tools derived 
from the Etyma framework in this manner are described. 

Fig. 9. Architecture of completions. 



An Interpreter for CMS. The CMS interpreter consists of two parts: a basic 
Scheme interpreter written in the C language, and the module system, imple- 
mented as a completion of Etyma. The basic Scheme interpreter itself was 
extracted from a publicly available scriptable windowing toolkit called STk [15]. 
The interpreter implementation exports many of the functions implementing 
Scheme semantics, thus making it easy to access its internals. Furthermore, the 
interpreter was originally designed to be extensible, i.e., new Scheme primitives 
can be implemented in C/C++ and easily incorporated into the interpreter. 
Thus, in order to implement CMS, Scheme primitives implementing concepts of 
compositional modularity such as mk-module, mk-instance, self-ref, merge, etc. 
were implemented in C++ and incorporated into the interpreter. 

The class design for the CMS module system completion is as follows. At- 
tribute bindings within CMS modules can be Scheme values, variables, or meth- 
ods. These can be modeled as subclasses of framework classes PrimValue (not 
shown in Figure 8), Location, and Method respectively. The method subclass 
need not store the method body as a subclass of ExprNode; instead, it can sim- 
ply store the internal representation of the Scheme expression as exported by 
the interpreter implementation. Additionally, the method subclass must define 
methods corresponding to the CMS primitives self-ref, self-set!, etc. which call 
similar methods on the stored self object. 

With the classes mentioned above, the implementation of class StdModule can 
be almost completely reused for implementing CMS modules. However, methods 
to handle CMS primitives mk-module, mk-instance, etc. must be added. The only 
modification required is to redefine the method create_instance of class StdModule 
to return an object of an appropriate subclass of class StdInstance. This subclass 
needs to implement code for CMS primitives such as attr-ref and attr-set!. 

Table 1. Reuse of framework design and code for CMS interpreter. 

Table 1 shows several measures of reuse for the CMS module system imple- 
mented as a completion of Etyma. The percentages for class and method reuse 
give an indication of design reuse, since classes and their methods represent the 
functional decomposition and interface design of the framework. On the other 
hand, the percentages for lines of code give a measure of code reuse. 

A Compiler Front-end for Compositional IDL. Although we have not de- 
scribed the classes in Etyma that relate to types, there is a comprehensive set of 
reusable classes that correspond to the notions of interfaces, record types, func- 
tion types, etc. All type classes are subclasses of the abstract superclass Type 
in Figure 8 which defines abstract methods for type equality, subtyping, and for 
finding bounds of type pairs in a type lattice. Interfaces correspond to the types 
of modules, and support predicate methods that implement the typechecking 
rules for each of the module operators. Furthermore, abstract class Interface 
and its concrete subclass StdInterface are implemented in a manner similar to 
classes Module and StdModule. 

Briefly, the class design for the compositional IDL front-end completion is 
as follows. Attribute bindings within interfaces can be base types, function types, 
or record types, designed as subclasses of the corresponding generic classes in 
the framework. Define class IDLInterface as a subclass of class StdInterface, and 
define methods merge(IDLInterface), rename(Label, Label), etc. to return new 
interface objects after performing the appropriate operations. Furthermore, the 
notion of the recursive type selftype is implemented as the special framework class 
SelfType (a singleton pattern). Recursive type equality and subtyping methods of 
StdInterface, which implement the algorithms given in [1], can be reused directly 
in the IDLInterface class. Design and code reuse numbers for this completion 
prototype are given in Table 2. 

An Interpreter for MTeX. The STk-derived Scheme interpreter was used for 
MTeX in a manner similar to CMS. The subclasses of Etyma created to con- 
struct the MTeX module engine are: TexLabel of Label, Section of Method, 
TexModule of StdModule, SecMap of AttrMap, and TexInterface of StdInterface. 
Also, a new class Segment that represents a segment of text between self-references 
was created. Approximate design and code reuse numbers for the MTeX imple- 
mentation are shown in Table 3. 

Table 3. Reuse of framework design and code for building MTeX. 

4.3 Framework Evolution 

The very first version of Etyma was almost fully concrete, and was designed 
to experiment with a module extension to the C language. It consisted only of 
the notions of modules, instances, primitive values, and locations, along with a 
few support classes. No front and back ends were constructed. The next incar- 
nation of Etyma was used to build a typechecking mechanism for C language 
object modules, described in [4]. This experiment solidified many of the type 
classes of Etyma. However, at this point, Etyma was still primarily a set of 
concrete classes. The third incarnation was used to direct the reengineering of 
the programmable linker/loader OMOS described earlier. In this iteration, the 
framework was not directly used in the construction of OMOS (due to practical, 
not technical constraints), but it evolved in parallel with the actual class hierar- 
chy of OMOS. The design of OMOS classes follows that of Etyma closely. Also, 
much of the framework, including the abstract and concrete layers, was developed 
during this iteration. 

The fourth iteration over Etyma was the construction of the CMS completion. 
There were few changes to the framework classes in this iteration; these were 
mostly to fix implementation bugs. However, some new methods for retroactive 
nesting were added. Nonetheless, the CMS interpreter was constructed within 
a very short period of time, and resulted in a high degree of reuse. The next 
iteration was to design and implement an IDL compiler front-end. There were 
almost no modifications to the framework; additions included selftype-related 
code. The sixth, and most recent, iteration over Etyma has been to build the 
MTeX document composition system. There were no changes to the framework. 

The first three iterations essentially evolved the framework from a set of con- 
crete classes to a reusable set of abstract and concrete classes, thus crystallizing 
the reusable functionality of the framework. From the fourth iteration onwards, 
the framework was mostly reused, with some additions, but very few modifi- 
cations. As the observed reusability of the framework increased, measurements 
were taken to record the reuse achieved, as shown in the tables earlier. 

5 Conclusions and Future Work 

We have shown in this paper that OO class inheritance viewed as operations 
on self-referential namespaces is a broadly applicable concept. Specifically, we 
have shown how to apply compositional modularity (CM), a model that de- 
fines a comprehensive suite of operations on modules viewed as self-referential 
namespaces, to a variety of software artifacts such as Scheme language modules, 
compiled object files, interfaces, and document fragments. 

We have described four tools that can help effectively manage these software 
artifacts. These are: (i) an interpreter for the programming language Scheme 
extended with the notion of compositional modules, (ii) a linker that manipulates 
compiled object files as compositional modules, (iii) a compiler front-end for a 
language with compositional interfaces, and (iv) a document processing system 
that manipulates documents as compositional modules. Furthermore, we show 
that these systems benefit significantly by incorporating concepts of module 
composition (i.e., class inheritance). 

The implementations of tools for systems based on CM share a lot in common. 
Hence, we argue that a reusable software architecture for such tools is beneficial. 
We describe a reusable OO framework named Etyma from which tools such as 
the above can be efficiently constructed. Etyma currently comprises about 45 
reusable C++ classes in 7,000 lines that evolved over six iterations. Three of the 
above tools were built by directly reusing Etyma, resulting in significant levels 
(between 73 and 91%) of design and code reuse. 

Many other tools can be based on CM and can be built by completing 
Etyma. Naturally, CM can be applied within other programming language 
processors: compilers and interpreters for modular and non-modular languages. 
There is also an abundance of software artifacts that can be viewed as self- 
referential namespaces, and that have a useful notion of composition. For ex- 
ample, tools that manage GUI components viewed as compositional entities are 
conceivable. File systems that view directories as self-referential namespaces (i.e., 
filenames bound to file contents that refer back to other filenames) could also 
be useful. We also speculate that the commonality of the underlying models of 
such tools can be exploited for supporting interoperability among them. 